
34 research outputs found

Computing Palindromes on a Trie in Linear Time

Author: Funakoshi Mitsuru
Inenaga Shunsuke
Mieno Takuya
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 33rd International Symposium on Algorithms and Computation (ISAAC 2022)
Publication date: 01/01/2022

Dagstuhl Research Online Publication Server

Shortest Unique Substring Queries on Run-Length Encoded Strings

Author: Bannai Hideo
Inenaga Shunsuke
Mieno Takuya
Takeda Masayuki
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 41st International Symposium on Mathematical Foundations of Computer Science (MFCS 2016)
Publication date: 01/01/2016

We consider the problem of answering shortest unique substring (SUS) queries on run-length encoded strings. For a string S, a unique substring u = S[i..j] is said to be a shortest unique substring (SUS) of S containing an interval [s, t] (i <= s <= t <= j) if for any i' <= s <= t <= j' with j-i > j'-i', S[i'..j'] occurs at least twice in S. Given a run-length encoding of size m of a string of length N, we show that we can construct a data structure of size O(m + pi_s(N, m)) in O(m log m + pi_c(N, m)) time such that queries can be answered in O(pi_q(N, m) + k) time, where k is the size of the output (the number of SUSs), and pi_s(N, m), pi_c(N, m), pi_q(N, m) are, respectively, the size, construction time, and query time for a predecessor/successor query data structure of m elements for the universe of [1, N]. Using the data structure by Beame and Fich (JCSS 2002), this results in a data structure of O(m) space that is constructed in O(m log m) time, and answers queries in O(sqrt(log m/loglog m) + k) time.
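As a rough illustration of the two ingredients named in this abstract, the sketch below builds a run-length encoding and then locates the run that contains a given position. It is only an assumption-laden stand-in: the helper names (`run_length_encode`, `run_starts`, `run_containing`) are hypothetical, and the predecessor query is answered by plain binary search in O(log m) time, not by the Beame and Fich structure.

```python
from bisect import bisect_right
from itertools import groupby


def run_length_encode(s):
    """Return the run-length encoding of s as (character, run length) pairs."""
    return [(ch, len(list(run))) for ch, run in groupby(s)]


def run_starts(rle):
    """Return the 1-based start position of every run."""
    starts, pos = [], 1
    for _, length in rle:
        starts.append(pos)
        pos += length
    return starts


def run_containing(starts, p):
    """Index of the run containing 1-based position p.

    This is a predecessor query on the sorted run starts, answered here
    by binary search (an assumption made for simplicity).
    """
    return bisect_right(starts, p) - 1


rle = run_length_encode("aaabbbbcaa")   # N = 10 characters, m = 4 runs
starts = run_starts(rle)
```

Here `run_containing(starts, 5)` gives the index of the run of `b`s, since position 5 falls inside it.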

Dagstuhl Research Online Publication Server

Tight Bounds on the Maximum Number of Shortest Unique Substrings

Author: Bannai Hideo
Inenaga Shunsuke
Mieno Takuya
Takeda Masayuki
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)
Publication date: 01/01/2017

A substring Q of a string S is called a shortest unique substring (SUS) for interval [s,t] in S, if Q occurs exactly once in S, this occurrence of Q contains interval [s,t], and every substring of S which contains interval [s,t] and is shorter than Q occurs at least twice in S. The SUS problem is, given a string S, to preprocess S so that for any subsequent query interval [s,t] all the SUSs for interval [s,t] can be answered quickly. When s = t, we call the SUSs for [s, t] point SUSs, and when s <= t, we call the SUSs for [s, t] interval SUSs. There exists an optimal O(n)-time preprocessing scheme which answers queries in optimal O(k) time for both point and interval SUSs, where n is the length of S and k is the number of outputs for a given query. In this paper, we reveal structural, combinatorial properties underlying the SUS problem: namely, we show that the number of intervals in S that correspond to point SUSs for all query positions in S is less than 1.5n, and show that this is a matching upper and lower bound. Also, we consider the maximum number of intervals in S that correspond to interval SUSs for all query intervals in S.
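The definition above can be checked directly by brute force. The following sketch is not the optimal O(n)-preprocessing scheme mentioned in the abstract; it is a naive enumeration, with hypothetical function names, that tries candidate lengths in increasing order and returns every unique substring of the first length that works.

```python
def count_occurrences(s, sub):
    """Count possibly overlapping occurrences of sub in s."""
    return sum(1 for i in range(len(s) - len(sub) + 1) if s.startswith(sub, i))


def interval_sus(s, start, end):
    """Return all SUSs for the 1-based interval [start, end] as (i, j) pairs."""
    n = len(s)
    for length in range(end - start + 1, n + 1):
        found = []
        # Every i such that S[i..i+length-1] lies in S and contains [start, end].
        for i in range(max(1, end - length + 1), min(start, n - length + 1) + 1):
            j = i + length - 1
            if count_occurrences(s, s[i - 1:j]) == 1:
                found.append((i, j))
        if found:
            return found  # shorter candidates all occur at least twice
    return []


def point_sus_intervals(s):
    """Distinct intervals that are a point SUS for at least one position."""
    return {iv for p in range(1, len(s) + 1) for iv in interval_sus(s, p, p)}
```

For the string `aab`, position 2 has two point SUSs, `aa` at (1, 2) and `ab` at (2, 3), and the set returned by `point_sus_intervals` can be compared against the 1.5n bound stated in the abstract.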

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Sliding suffix trees simplified

Author: Bannai Hideo
Inenaga Shunsuke
Leonard Laurentius
Mieno Takuya
Publication date: 03/07/2023

Sliding suffix trees (Fiala & Greene, 1989) for an input text T over an alphabet of size σ and a sliding window W of T can be maintained in O(|T| log σ) time and O(|W|) space. The two previous approaches that achieve this can be categorized into the credit-based approach of Fiala and Greene (1989) and Larsson (1996, 1999), or the batch-based approach proposed by Senft (2005). Brodnik and Jekovec (2018) showed that the sliding suffix tree can be supplemented with leaf pointers in order to find all occurrences of an online query pattern in the current window, and that leaf pointers can be maintained by credit-based arguments as well. The main difficulty in the credit-based approach is in the maintenance of index-pairs that represent each edge. In this paper, we show that valid edge index-pairs can be derived in constant time from leaf pointers, thus reducing the maintenance of edge index-pairs to the maintenance of leaf pointers. We further propose a new simple method which maintains leaf pointers without using credit-based arguments. Our algorithm and proof of correctness are much simpler compared to the credit-based approach, whose analyses were initially flawed (Senft 2005).
Comment: 12 pages + 5 pages of appendix. 18 figures in total.

arXiv.org e-Print Archive

Finding Top-k Longest Palindromes in Substrings

Author: Horiyama Takashi
Mieno Takuya
Mitani Kazuki
Seto Kazuhisa
Publication date: 17/06/2023

Palindromes are strings that read the same forward and backward. Problems of computing palindromic structures in strings have been studied for many years with a motivation of their application to biology. The longest palindrome problem is one of the most important and classical problems regarding palindromic structures, that is, to compute the longest palindrome appearing in a string T of length n. The problem can be solved in O(n) time by the famous algorithm of Manacher [Journal of the ACM, 1975]. This paper generalizes the longest palindrome problem to the problem of finding top-k longest palindromes in an arbitrary substring, including the input string T itself. The internal top-k longest palindrome query is, given a substring T[i..j] of T and a positive integer k as a query, to compute the top-k longest palindromes appearing in T[i..j]. This paper proposes a linear-size data structure that can answer internal top-k longest palindromes query in optimal O(k) time. Also, given the input string T, our data structure can be constructed in O(n log n) time. For k = 1, the construction time is reduced to O(n).
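For reference, the classical longest palindrome problem cited in this abstract can be sketched with Manacher's algorithm as follows. This is only a minimal illustration of the k = 1 case on a whole string; it is not the paper's data structure, and the function name and the `#` separator (assumed not to occur in the input) are choices made for this example.

```python
def longest_palindrome(t):
    """Manacher's algorithm: a longest palindromic substring of t in O(n) time."""
    if not t:
        return ""
    # Interleave a separator so that every palindrome in u has odd length.
    # Assumption: "#" does not occur in t.
    u = "#" + "#".join(t) + "#"
    radius = [0] * len(u)
    center = right = 0  # palindrome reaching furthest right is centered at `center`
    for i in range(len(u)):
        if i < right:
            # Reuse the radius of the mirror position inside the known palindrome.
            radius[i] = min(right - i, radius[2 * center - i])
        lo, hi = i - radius[i] - 1, i + radius[i] + 1
        while lo >= 0 and hi < len(u) and u[lo] == u[hi]:
            radius[i] += 1
            lo -= 1
            hi += 1
        if i + radius[i] > right:
            center, right = i, i + radius[i]
    best = max(range(len(u)), key=radius.__getitem__)
    start = (best - radius[best]) // 2  # map back to an index in t
    return t[start:start + radius[best]]
```

A naive way to answer an internal query for k = 1 is to call `longest_palindrome(t[i - 1:j])` on the 1-based substring T[i..j], which costs time linear in the substring length per query, in contrast to the precomputed structure described in the abstract.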

arXiv.org e-Print Archive

String Sanitization Under Edit Distance: Improved and Generalized

Author: Mieno Takuya
Pissis Solon
Stougie Leen
Sweering Michelle
Publication venue: HAL CCSD
Publication date: 05/07/2021

Let W be a string of length n over an alphabet Σ, k be a positive integer, and S be a set of length-k substrings of W. The ETFS problem asks us to construct a string X_ED such that: (i) no string of S occurs in X_ED; (ii) the order of all other length-k substrings over Σ is the same in W and in X_ED; and (iii) X_ED has minimal edit distance to W. When W represents an individual's data and S represents a set of confidential patterns, the ETFS problem asks for transforming W to preserve its privacy and its utility [Bernardini et al., ECML PKDD 2019]. ETFS can be solved in O(n^2 k) time [Bernardini et al., CPM 2020]. The same paper shows that ETFS cannot be solved in O(n^{2−δ}) time, for any δ > 0, unless the Strong Exponential Time Hypothesis (SETH) is false. Our main results can be summarized as follows:
• An O(n^2 log^2 k)-time algorithm to solve ETFS.
• An O(n^2 log^2 n)-time algorithm to solve AETFS, a generalization of ETFS in which the elements of S can have arbitrary lengths.
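Two of the conditions in this abstract are easy to state in code: condition (i), that no sensitive pattern occurs in the output, and the edit distance used in condition (iii). The sketch below shows only these two checks, using the textbook quadratic dynamic program for edit distance; it does not construct X_ED, does not check condition (ii), and its function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between a and b (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def avoids_all(x, sensitive):
    """Condition (i): no string in `sensitive` occurs as a substring of x."""
    return not any(p in x for p in sensitive)
```

A candidate output `x` for input `w` could then be screened with `avoids_all(x, S)` and scored with `edit_distance(w, x)`.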

INRIA a CCSD electronic archive server